Improved K-anonymity privacy protection algorithm based on different sensitivities

Ran ZHAI, Xuebin CHEN, Guopeng ZHANG, Langtao PEI, Zheng MA

Journal of Computer Applications 2023, 43 (5): 1497-1503. DOI: 10.11772/j.issn.1001-9081.2022040552

To address the need of machine learning for large real-world datasets that are both secure and usable, an improved K-anonymity privacy protection algorithm based on Random Forest (RF), named RFK-anonymity privacy protection, was proposed. Firstly, the sensitivity of each attribute value was predicted by the RF algorithm. Secondly, the attribute values were clustered by sensitivity using the k-means clustering algorithm, and the data were hidden to different degrees by the K-anonymity algorithm according to the sensitivity cluster of each attribute. Finally, different users could select data tables with different hiding degrees according to their needs. Experimental results show that on the Adult dataset, compared with data processed by the K-anonymity algorithm, the accuracies of data processed by the RFK-anonymity privacy protection algorithm are increased by 0.5 and 1.6 percentage points at thresholds of 3 and 4, respectively; compared with data processed by the (p, α, k)-anonymity algorithm, the accuracies are improved by 0.4 and 1.9 percentage points at thresholds of 4 and 5, respectively. These results indicate that the RFK-anonymity privacy protection algorithm can effectively improve data availability while protecting data privacy and security, and is better suited to classification and prediction in machine learning.
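The sensitivity-tiered hiding step can be illustrated with a minimal Python sketch. It assumes per-attribute sensitivity scores are already available (the abstract obtains them from a Random Forest, which is not reproduced here), groups the scores with a plain 1-D k-means, generalizes each attribute according to its tier, and then checks K-anonymity. The tier hierarchy in `generalize` and all function names are hypothetical illustrations rather than the authors' implementation, and the sketch applies fixed generalizations instead of searching for a minimal one.

```python
from collections import Counter

def kmeans_1d(values, k, iters=20):
    """Lloyd's k-means on 1-D sensitivity scores; returns centers sorted ascending."""
    s = sorted(values)
    # Spread the initial centers evenly over the sorted scores.
    centers = [s[round(i * (len(s) - 1) / max(k - 1, 1))] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # An empty cluster keeps its previous center.
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return sorted(centers)

def sensitivity_tiers(scores, k=3):
    """Map each attribute to a tier index (0 = least sensitive)."""
    centers = kmeans_1d(list(scores.values()), k)
    return {a: min(range(k), key=lambda c: abs(s - centers[c])) for a, s in scores.items()}

def generalize(value, tier):
    """Hypothetical hierarchy: tier 0 keeps, tier 1 coarsens, tier 2+ suppresses."""
    if tier == 0:
        return value
    if tier == 1:
        if isinstance(value, int):
            lo = value // 10 * 10          # e.g. 23 -> "20-29"
            return f"{lo}-{lo + 9}"
        return value[:-2] + "**"           # e.g. ZIP "47677" -> "476**"
    return "*"

def anonymize(rows, tiers):
    """Generalize every attribute of every row according to its tier."""
    return [{a: generalize(v, tiers.get(a, 0)) for a, v in r.items()} for r in rows]

def is_k_anonymous(rows, quasi_ids, k):
    """True if every quasi-identifier combination occurs at least k times."""
    groups = Counter(tuple(r[a] for a in quasi_ids) for r in rows)
    return all(count >= k for count in groups.values())
```

With hypothetical scores such as `{"age": 0.5, "zip": 0.9, "sex": 0.1}` and three tiers, `sex` stays exact, `age` is bucketed into decades, and `zip` is suppressed. Publishing several tables built from different tier-to-generalization mappings is one way to offer users a choice of hiding degree.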

DDoS attack detection by random forest fused with feature selection

Jingcheng XU, Xuebin CHEN, Yanling DONG, Jia YANG

Journal of Computer Applications 2023, 43 (11): 3497-3503. DOI: 10.11772/j.issn.1001-9081.2022111792

Existing machine learning-based methods for Distributed Denial-of-Service (DDoS) attack detection face rising detection difficulty and cost as network traffic grows more complex and data structures keep expanding. To address these issues, a random forest DDoS attack detection method integrating feature selection was proposed. In this method, the Gini-coefficient-based mean decrease impurity algorithm was used for feature selection to reduce the dimensionality of DDoS abnormal traffic samples, thereby reducing training cost and improving training accuracy. Meanwhile, the feature selection algorithm was embedded into each individual base learner of the random forest, narrowing the feature subset search range from all features to the features assigned to a single base learner; this tightened the coupling between the two algorithms and improved model accuracy. Experimental results show that, with the number of decision trees and the training sample size limited, the model trained by the proposed method improves recall by 21.8 percentage points and F1-score by 12.0 percentage points over the model before improvement, and both metrics are also better than those of the traditional random forest detection scheme.
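A minimal sketch of Gini-based feature scoring restricted to each base learner's own feature subset is shown below. Assumptions: features are numeric, each feature is scored by its best single-threshold Gini decrease on the whole sample (a simplification of mean decrease impurity, which accumulates decreases over all split nodes of all trees), and the helper names are hypothetical rather than taken from the paper.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity 1 - sum_c p_c^2 of a label list."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(column, labels):
    """Best weighted Gini decrease over all binary thresholds on one feature."""
    parent, n, best = gini(labels), len(labels), 0.0
    # Skip the largest value: splitting there would leave the right child empty.
    for t in sorted(set(column))[:-1]:
        left = [y for x, y in zip(column, labels) if x <= t]
        right = [y for x, y in zip(column, labels) if x > t]
        child = len(left) / n * gini(left) + len(right) / n * gini(right)
        best = max(best, parent - child)
    return best

def rank_features(rows, labels, candidates):
    """Order candidate feature indices by impurity decrease, highest first."""
    score = {j: impurity_decrease([r[j] for r in rows], labels) for j in candidates}
    return sorted(candidates, key=lambda j: score[j], reverse=True)

def per_learner_features(rows, labels, n_learners, subset_size, keep, seed=0):
    """For each base learner, draw a random feature subset and keep its top features."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    return [rank_features(rows, labels, rng.sample(range(n_features), subset_size))[:keep]
            for _ in range(n_learners)]
```

The design point mirrors the abstract: ranking happens inside each learner's sampled subset instead of over all features, so each tree's selection cost scales with `subset_size` rather than with the full feature count.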

Improved federated weighted average algorithm

Changyin LUO, Junyu WANG, Xuebin CHEN, Chundi MA, Shufen ZHANG

Journal of Computer Applications 2022, 42 (4): 1131-1136. DOI: 10.11772/j.issn.1001-9081.2021071264

Aiming at the problem that the improved federated average algorithm based on the analytic hierarchy process is affected by subjective factors when calculating data quality, an improved federated weighted average algorithm was proposed to process multi-source data from the perspective of data quality. Firstly, the training samples were divided into pre-training samples and pre-testing samples. Then, the accuracy of the initial global model on the pre-training data was used as the quality weight of each data source. Finally, the quality weight was introduced into the federated average algorithm to re-update the weights of the global model. Simulation results show that the model trained by the improved federated weighted average algorithm achieves higher accuracy than the model trained by the traditional federated average algorithm, with improvements of 1.59% and 1.24% on equally and unequally divided datasets, respectively. Meanwhile, compared with the traditional multi-party data retraining method, the proposed model has slightly lower accuracy but improves the security of both data and model.
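The quality-weighting idea can be sketched as follows. The abstract does not state how the quality weight combines with the sample-count weight used by ordinary federated averaging, so this sketch assumes client weights proportional to `n_k * q_k` (sample count times quality) and represents model parameters as flat lists of floats; both choices and the function names are hypothetical.

```python
def source_accuracy(model, xs, ys):
    """Accuracy of the initial global model on one data source's samples (its quality q_k)."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(ys)

def quality_weighted_average(client_params, sample_counts, qualities):
    """Aggregate client parameter vectors with weights proportional to n_k * q_k.

    Assumption: the paper's exact combination of quality and sample count is not
    given in the abstract; n_k * q_k is one plausible choice.
    """
    raw = [n * q for n, q in zip(sample_counts, qualities)]
    total = sum(raw)
    weights = [r / total for r in raw]
    dim = len(client_params[0])
    # Element-wise weighted sum of the clients' parameters.
    merged = [sum(w * p[i] for w, p in zip(weights, client_params)) for i in range(dim)]
    return merged, weights
```

For example, two clients with 100 samples each and qualities 0.9 and 0.3 receive weights 0.75 and 0.25, so the lower-quality source pulls the global parameters less than plain federated averaging (0.5 and 0.5) would.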

K-Prototypes clustering method for local differential privacy

Guopeng ZHANG, Xuebin CHEN, Haoshi WANG, Ran ZHAI, Zheng MA

Journal of Computer Applications 2022, 42 (12): 3813-3821. DOI: 10.11772/j.issn.1001-9081.2021101724

To protect data privacy while ensuring data availability in clustering analysis, a privacy-preserving clustering scheme based on the Local Differential Privacy (LDP) technique, called LDPK-Prototypes (LDP K-Prototypes), was proposed. Firstly, users encoded the hybrid (mixed numerical and categorical) dataset. Then, a randomized response mechanism was used to perturb the sensitive data; after collecting the users' perturbed data, the third party recovered the original dataset as closely as possible. After that, the K-Prototypes clustering algorithm was performed. During clustering, the initial cluster centers were determined by a dissimilarity measure, and the distance formula was redefined using the entropy weight method. Theoretical analysis and experimental results show that, compared with the ODPC (Optimizing and Differentially Private Clustering) algorithm based on the Centralized Differential Privacy (CDP) technique, the proposed scheme improves the average accuracy on the Adult and Heart datasets by 2.95% and 12.41% respectively, effectively improving clustering usability. Meanwhile, LDPK-Prototypes enlarges the differences between data, effectively avoids local optima, and improves the stability of the clustering algorithm.
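The perturb-then-recover step and the K-Prototypes dissimilarity can be sketched in Python. The abstract names only "a randomized response mechanism", so generalized (k-ary) randomized response with its standard unbiased frequency estimator is used here as a stand-in. The distance is Huang's K-Prototypes dissimilarity (squared Euclidean distance on numeric attributes plus `gamma` times the number of categorical mismatches), with optional per-attribute weights standing in for the entropy weights. Function names and parameters are hypothetical.

```python
import math
import random
from collections import Counter

def grr_perturb(value, domain, epsilon, rng):
    """User side: keep the true value with prob p = e^eps / (e^eps + d - 1),
    otherwise report one of the other d - 1 values uniformly at random."""
    d = len(domain)
    p = math.exp(epsilon) / (math.exp(epsilon) + d - 1)
    if rng.random() < p:
        return value
    return rng.choice([v for v in domain if v != value])

def grr_estimate(reports, domain, epsilon):
    """Collector side: unbiased frequency estimate f_v = (c_v / n - q) / (p - q)."""
    d, n, e = len(domain), len(reports), math.exp(epsilon)
    p, q = e / (e + d - 1), 1 / (e + d - 1)
    counts = Counter(reports)
    return {v: (counts[v] / n - q) / (p - q) for v in domain}

def kprototypes_distance(x_num, x_cat, c_num, c_cat, gamma=1.0, w_num=None, w_cat=None):
    """Weighted squared Euclidean on numeric parts + gamma * weighted mismatches on categorical parts."""
    if w_num is None:
        w_num = [1.0] * len(x_num)
    if w_cat is None:
        w_cat = [1.0] * len(x_cat)
    num = sum(w * (a - b) ** 2 for w, a, b in zip(w_num, x_num, c_num))
    cat = sum(w * (a != b) for w, a, b in zip(w_cat, x_cat, c_cat))
    return num + gamma * cat
```

Each user calls `grr_perturb` once on a sensitive value; the collector sees only the reports and calls `grr_estimate`. The estimates are unbiased, but for rare values they can fall slightly outside [0, 1], so a practical collector would typically clip or renormalize them before clustering.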